15 research outputs found

    Learning to Search in Reinforcement Learning

    In this thesis, we investigate the use of search-based algorithms with deep neural networks to tackle a wide range of problems, from board games to video games and beyond. Drawing inspiration from AlphaGo, the first computer program to achieve superhuman performance in the game of Go, we developed a new algorithm, AlphaZero. AlphaZero is a general reinforcement learning algorithm that combines deep neural networks with Monte Carlo tree search for planning and learning. Starting completely from scratch, without any prior human knowledge beyond the basic rules of the game, AlphaZero achieved superhuman performance in Go, chess, and shogi. Subsequently, building upon the success of AlphaZero, we investigated ways to extend our methods to problems in which the rules are not known or cannot be hand-coded. This line of work led to the development of MuZero, a model-based reinforcement learning agent that builds a deterministic internal model of the world and uses it to construct plans in its imagination. We applied our method to Go, chess, shogi, and the classic Atari suite of video games, achieving superhuman performance. MuZero is the first RL algorithm to master both canonical challenges for high-performance planning and visually complex problems using the same principles. Finally, we describe Stochastic MuZero, a general agent that extends the applicability of MuZero to highly stochastic environments. We show that our method achieves superhuman performance in stochastic domains such as backgammon and the classic game of 2048, while matching the performance of MuZero in deterministic ones like Go.
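    The central mechanism this line of work builds on, Monte Carlo tree search guided by a network's policy priors and value estimates, can be illustrated by the PUCT selection rule used to choose which move to explore next. The following is a minimal sketch of that one step only; the statistics and the `c_puct` constant are illustrative assumptions, not values from the thesis:

```python
import math

def puct_score(q, p, n_parent, n_child, c_puct=1.5):
    """AlphaZero-style PUCT rule: exploit the mean backed-up value q,
    explore in proportion to the network prior p, discounted by visits."""
    return q + c_puct * p * math.sqrt(n_parent) / (1 + n_child)

# Hypothetical statistics for three candidate moves at one tree node.
priors = [0.6, 0.3, 0.1]   # policy-network output (illustrative numbers)
values = [0.1, 0.4, 0.0]   # mean value of simulations through each move
visits = [10, 3, 1]        # visit counts N(s, a)
n_parent = sum(visits)

scores = [puct_score(q, p, n_parent, n)
          for q, p, n in zip(values, priors, visits)]
best = max(range(len(scores)), key=scores.__getitem__)  # move explored this simulation
```

    Note how the second move wins here despite a lower prior: its higher mean value outweighs the exploration bonus of the heavily favored first move, which has already been visited many times.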

    Playing Atari with Deep Reinforcement Learning

    We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them. Comment: NIPS Deep Learning Workshop 2013
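    The paper's network replaces a lookup table with a convolutional network over pixels, but the underlying update is ordinary Q-learning. A minimal tabular sketch on a hypothetical five-state chain environment (environment, hyperparameters, and episode counts are all illustrative, not from the paper):

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(s):
    if Q[s][0] == Q[s][1]:          # break ties randomly
        return random.randrange(2)
    return 0 if Q[s][0] > Q[s][1] else 1

for _ in range(500):                # episodes
    s = 0
    for _ in range(100):            # step cap per episode
        a = random.randrange(2) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])   # temporal-difference update
        if done:
            break
        s = s2

policy = [greedy(s) for s in range(GOAL)]   # learned greedy action per state
```

    The paper's variant applies the same temporal-difference target, but computes `Q` with a convolutional network and stabilizes training by sampling transitions from an experience replay buffer.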

    Leaf age-dependent effects of foliar-sprayed CuZn nanoparticles on photosynthetic efficiency and ROS generation in <i>Arabidopsis thaliana</i>

    Young and mature leaves of <i>Arabidopsis thaliana</i> were exposed by foliar spray to 30 mg L−1 of CuZn nanoparticles (NPs). The NPs were synthesized by a microwave-assisted polyol process and characterized by dynamic light scattering (DLS), X-ray diffraction (XRD), and transmission electron microscopy (TEM). The effects of CuZn NPs in Arabidopsis leaves were evaluated by chlorophyll fluorescence imaging analysis, which revealed spatiotemporal heterogeneity in the quantum efficiency of PSII photochemistry (ΦPSII) and the redox state of the plastoquinone (PQ) pool (qp), measured 30 min, 90 min, 180 min, and 240 min after spraying. Photosystem II (PSII) function in young leaves was negatively affected, especially 30 min after spraying, when increased H2O2 generation correlated with a more reduced PQ pool. The photosynthetic efficiency of young leaves recovered only 240 min after the NP spray, by which time ROS accumulation had also returned to the level of control leaves. In contrast, a beneficial effect on PSII function was observed in mature leaves 30 min after the CuZn NP spray, with increased ΦPSII, an increased electron transport rate (ETR), decreased singlet oxygen (1O2) formation, and H2O2 production at the same level as control leaves. An explanation for this differential response is suggested.